text-to-image model
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Middle East > Israel (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States (0.14)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Africa > Namibia (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
- Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (0.94)
- North America > United States > California > Santa Clara County > Palo Alto (0.15)
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Overview (1.00)
- Research Report > New Finding (0.46)
- Information Technology (0.92)
- Law > Intellectual Property & Technology Law (0.68)
- Health & Medicine > Therapeutic Area (0.46)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.49)
- Asia > China > Hong Kong (0.04)
- North America > Dominican Republic (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- (2 more...)
- Government > Military (0.93)
- Information Technology > Security & Privacy (0.68)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. Using this web app, we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users' preferences over generated images.
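The paper describes collecting prompts and users' preferences over generated images through a web app. As a rough illustration only, the sketch below shows what a single preference record in such a dataset might look like; the field names and structure are assumptions made for illustration, not the actual Pick-a-Pic schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PreferenceExample:
    """One hypothetical preference record: a prompt, two generated images,
    and which one (if either) the user preferred. Field names are
    illustrative, not the actual Pick-a-Pic schema."""
    prompt: str               # the text-to-image prompt the user wrote
    image_a: bytes            # first generated image (e.g., PNG bytes)
    image_b: bytes            # second generated image
    preferred: Optional[str]  # "a", "b", or None for a tie / no preference

# Toy usage with placeholder image bytes.
example = PreferenceExample(
    prompt="a watercolor painting of a lighthouse at dusk",
    image_a=b"...",
    image_b=b"...",
    preferred="a",
)
```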
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. Even though relatively simple approaches (e.g., rejection sampling based on reward scores) have been investigated, fine-tuning text-to-image models with the reward function remains challenging. In this work, we propose using online reinforcement learning (RL) to fine-tune text-to-image models. We focus on diffusion models, defining the fine-tuning task as an RL problem, and updating the pre-trained text-to-image diffusion models using policy gradient to maximize the feedback-trained reward. Our approach, coined DPOK, integrates policy optimization with KL regularization. We conduct an analysis of KL regularization for both RL fine-tuning and supervised fine-tuning. In our experiments, we show that DPOK is generally superior to supervised fine-tuning with respect to both image-text alignment and image quality.
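The abstract describes fine-tuning a diffusion model with policy gradient to maximize a learned reward while a KL term keeps it close to the pretrained model. The sketch below illustrates that kind of objective at the loss level; it is a minimal stand-in, not the authors' implementation, and `logp_theta`, `logp_pretrained`, `rewards`, and `kl_weight` are assumed placeholder inputs (in practice the log-likelihoods would come from the diffusion model's per-step denoising transitions).

```python
import torch

def dpok_style_loss(logp_theta: torch.Tensor,
                    logp_pretrained: torch.Tensor,
                    rewards: torch.Tensor,
                    kl_weight: float = 0.01) -> torch.Tensor:
    """REINFORCE-style surrogate: reward-weighted negative log-likelihood
    plus a simple KL-style penalty toward the frozen pretrained model.
    All inputs have shape (batch,)."""
    # Policy-gradient term: samples with higher reward get their
    # log-probability pushed up.
    pg_term = -(rewards.detach() * logp_theta).mean()
    # KL surrogate: penalize the gap between fine-tuned and pretrained
    # log-likelihoods on samples drawn from the fine-tuned model.
    kl_term = (logp_theta - logp_pretrained.detach()).mean()
    return pg_term + kl_weight * kl_term

# Toy usage with random stand-in values.
logp_theta = torch.randn(8, requires_grad=True)
logp_pre = torch.randn(8)
rewards = torch.rand(8)
loss = dpok_style_loss(logp_theta, logp_pre, rewards)
loss.backward()
```

The KL surrogate here is one common way to approximate the regularization the abstract mentions: it discourages the fine-tuned model from drifting far from the pretrained distribution while the reward term pulls it toward higher-scoring samples.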